This report is a work report rather than a scientific study; its aim is to make the analysis, methods and results reproducible.
The prediction indicates a general decline in mammal diversity, with regional differences. The strongest decline is found in the biodiversity hotspots of Central Africa and South America, while the number of species increases in northern Australia, Indonesia and Madagascar. The compared predictive models differ in performance: in terms of RMSE, the best model is ranger (random forest), followed by the neural network, glmnet and finally glm.
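RMSE, the criterion used above to rank the models, is the square root of the mean squared difference between observed and predicted species richness. The base-R sketch below shows how such a ranking can be computed; the function name `rmse` and the observed and predicted vectors are hypothetical toy values, not the results of this report.

```r
# Root mean squared error between observed and predicted values
rmse <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  sqrt(mean((observed - predicted)^2, na.rm = TRUE))
}

# Hypothetical observed richness for five cells and predictions of two models
observed <- c(10, 25, 3, 40, 12)
preds <- list(
  model_a = c(11, 24, 4, 38, 12),
  model_b = c(14, 20, 6, 30, 15)
)

# RMSE per model, sorted from best (lowest) to worst (highest)
scores <- sort(sapply(preds, rmse, observed = observed))
print(scores)
```

A lower RMSE means predictions closer to the observations, so the first entry of `scores` is the best-performing model.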
The objective is to model the occurrence of mammal species under a climate change scenario and to make predictions for the period 2050-2079. The report is also used to compare and evaluate the performance of different statistical models. The general approach is to fit the models to the data, evaluate their performance and make predictions. I predict the occurrence of mammals as a species richness index, i.e. the number of species in a given 1° × 1° raster cell (x, y). This requires some preprocessing of the data. The report is therefore organised as follows:
The data fall into three categories: species distribution data, climate data and land use data. The present-day species distribution data for 5265 mammals are available as shapefiles (IUCN Red List of Threatened Species). The climate data were generated by a simulation with the IPSL-CM5A model for the RCP6.0 scenario of the IPCC. The land use data were produced by LPJ-GUESS. The simulations were performed by Jörg Steinkamp (Senckenberg, Frankfurt, AG Hickler). The climate data consist of the same 4 variables, once as measurements and once as predictions:
The measurements are monthly averages over the period 1971-2000. The prediction, based on the RCP6.0 scenario (a radiative forcing of about 6.0 W/m² by 2100), consists of monthly averages over the period 2050-2079. The land use data consist of two data sets in .csv format: one for the period 1971-2000 and one predicted for the period 2050-2079. Land use is divided into 20 categories at a resolution of 0.5° × 0.5°.
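Since the land use data have a resolution of 0.5° and the analysis grid 1°, each 1° cell covers a 2 × 2 block of land use cells. A minimal base-R sketch of averaging such blocks, assuming one land use category is held as a matrix with rows running north to south (360 × 720 for the global data); the function `block_mean` is hypothetical. On a RasterLayer, raster's `aggregate(x, fact = 2, fun = mean)` performs this kind of aggregation.

```r
# Average non-overlapping fact x fact blocks of a matrix
# (e.g. 0.5-degree land use fractions -> 1-degree grid with fact = 2)
block_mean <- function(m, fact = 2) {
  stopifnot(nrow(m) %% fact == 0, ncol(m) %% fact == 0)
  # coarse row/column index that each fine row/column belongs to
  row_group <- rep(seq_len(nrow(m) / fact), each = fact)
  col_group <- rep(seq_len(ncol(m) / fact), each = fact)
  # sum rows within each row group, then columns within each column group
  row_sums <- rowsum(m, row_group)
  block_sums <- t(rowsum(t(row_sums), col_group))
  # divide block sums by the number of fine cells per block
  block_sums / fact^2
}

# Toy 4 x 4 "0.5-degree" layer -> 2 x 2 "1-degree" layer
fine <- matrix(1:16, nrow = 4, byrow = TRUE)
coarse <- block_mean(fine)
print(coarse)
```

Missing values (e.g. ocean cells) propagate, so a 1° cell becomes NA if any of its four 0.5° cells is NA.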
First, the data must be converted into a common format. The analysis is based on a grid with a resolution of 1° × 1°, so all data must be converted to refer to this grid.
library(raster)
# create a global one-degree grid (180 rows x 360 columns) with a WGS84 CRS
grid1 <- raster(nrows = 180, ncols = 360, xmn = -180, xmx = 180, ymn = -90, ymx = 90,
                crs = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0"))
# store the cell-centre coordinates of grid1 as a data.frame
coords1 <- as.data.frame(coordinates(grid1))
# save both objects to an .Rdata file
save(grid1, coords1, file = "grid.Rdata")
# remove the objects from the workspace to free memory
rm(grid1, coords1)
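raster numbers the cells of `grid1` row by row, starting at the north-west corner, so cell 1 is centred at 179.5°W, 89.5°N and cell 64800 at 179.5°E, 89.5°S. The sketch below reproduces this numbering in base R for the 1° grid, which is one way to attach point data to the grid; the function `lonlat_to_cell` is hypothetical, and for a real raster `cellFromXY(grid1, cbind(lon, lat))` serves the same purpose. Points on the eastern or southern boundary are clamped to the last column or row.

```r
# Cell number of a lon/lat point on the global 1-degree grid (180 rows x 360
# columns), numbered row by row from the north-west corner as in raster
lonlat_to_cell <- function(lon, lat) {
  n_col <- 360
  n_row <- 180
  col <- floor(lon + 180) + 1
  row <- floor(90 - lat) + 1
  # points on the eastern/southern boundary go to the last column/row
  col <- pmin(col, n_col)
  row <- pmin(row, n_row)
  (row - 1) * n_col + col
}

lonlat_to_cell(-179.5, 89.5)  # first cell (top-left)
lonlat_to_cell(179.5, -89.5)  # last cell
lonlat_to_cell(8.68, 50.11)   # a point near Frankfurt
```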
I convert the grid from a raster to polygons, so that each cell is a polygon. Then I use the over() function from the sp package with the polygons of each species and the grid polygons to calculate the area of each species' distribution shape in each cell. Next, the species in each cell are counted. This method is inconvenient and time-consuming, but I did not find an easier solution. Because it takes too long to run here, I load the already processed data and, as an example of the raw data, plot Vulpes vulpes in Figure 1.
# plot raw data
# note: maptools has been retired from CRAN; sf::st_read() is the current alternative
library(maptools)
mams <- readShapeSpatial("TERRESTRIAL_MAMMALS/TERRESTRIAL_MAMMALS")
# assign the geographic coordinate reference system (WGS84 longitude/latitude)
proj4string(mams) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0")
# for illustration, select the species with the largest range polygon (Vulpes vulpes, red fox)
widest <- as.character(mams$binomial[which.max(mams$shape_Area)])
vulpes <- mams[which(mams$binomial == widest), ]
# map with tmap
library("tmap")
vulpes_map <- tm_shape(vulpes, projection = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0") +
  tm_polygons()
tmap_leaflet(vulpes_map)
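Once the overlay has determined which 1° cells each species' range touches, the counting step amounts to tallying, for every cell, how many species list it. A minimal base-R sketch of that tally, assuming the overlay result is stored in a hypothetical named list `cells_by_species` (species name to cell numbers of the 180 × 360 grid, in raster order); the cell numbers below are toy values, and the over() call itself is not reproduced.

```r
n_cells <- 180 * 360  # cells of the global 1-degree grid

# Hypothetical overlay result: for each species, the grid cells its range touches.
# A species with several range polygons may list the same cell more than once.
cells_by_species <- list(
  "Vulpes vulpes" = c(14229, 14230, 14229),
  "Canis lupus"   = c(14229, 20000),
  "Lynx lynx"     = c(20000)
)

# Count each species at most once per cell, then tally species per cell
richness <- tabulate(unlist(lapply(cells_by_species, unique)), nbins = n_cells)

richness[c(14229, 14230, 20000)]  # species richness of the three toy cells
sum(richness > 0)                 # number of occupied cells
```

Because `richness` follows raster's cell order, it could be written back into the grid with `setValues(grid1, richness)` and mapped like the raw data above.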